PRITHIVSAKTHIUR's Repositories

1000-General-Knowledge-Flashcards

1000 Flashcards (General, Sports, Technical, Space) 📔📔

⭐ 6 🌐 Public

3D-Printed-Or-Not-SigLIP2

3D-Printed-Or-Not-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to distinguish between images of 3D printed and non-3D printed objects using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Age-Classification-SigLIP2

Age-Classification-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to predict the age group of a person from an image using the SiglipForImageClassification architecture.

⭐ 3 🌐 Public

Dino: The Minimalist Multipurpose Chat System

⭐ 2 🌐 Public

AI-Art-Generator-SDXL

AUTOMATIC1111: Software for tensor operations, saving tensor data in .safetensors format. ComfyUI: UI library, possibly managing tensor data safely with *.safetensors. InvokeAI: ML platform using *.safetensors for secure tensor storage.

⭐ 7 🌐 Public

AIorNot-SigLIP2

AIorNot-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to detect whether an image is generated by AI or is a real photograph using the SiglipForImageClassification architecture.

⭐ 3 🌐 Public

Airbnb-NYC-Maps

Airbnb Prices in NYC (Select Boroughs)

⭐ 6 🌐 Public

All-In-One-Downloader

yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now inactive youtube-dlc.

⭐ 6 🌐 Public

Alphabet-Sign-Language-Detection

Alphabet-Sign-Language-Detection is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into sign language alphabet categories using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Anime-Classification-v0.1

Anime-Classification-v1.0 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify anime-related images using the SiglipForImageClassification architecture.

⭐ 2 🌐 Public

Augmented-Waste-Classifier-SigLIP2

Augmented-Waste-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224

⭐ 3 🌐 Public

Auto-Abliteration

Modify a language model's behavior by abliterating its weights.

⭐ 4 🌐 Public

Aya-Vision-Ocr-vs-Qwen2VL-Ocr

Messy Handwriting OCR Comparison Between Aya-Vision-8B and Qwen2VL-OCR-2B

⭐ 3 🌐 Public

Banana Zoom is an advanced image enhancement web app that lets users select regions of an image for AI-powered upscaling and detail refinement. Using Google’s (nano banana)

⭐ 4 🌐 Public

Base64-to-Image-Encode

Convert Base64 to image online

⭐ 1 🌐 Public
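The Base64-to-image conversion that Base64-to-Image-Encode describes can be sketched with the Python standard library. This is an illustrative sketch, not the repository's code: the helper names `decode_base64_image` and `sniff_image_format` are hypothetical, and the format check looks only at a few well-known file signatures.

```python
import base64
import binascii


def decode_base64_image(data: str) -> bytes:
    """Decode a Base64 string, optionally wrapped in a data: URI, into raw bytes."""
    # Drop a prefix such as "data:image/png;base64," if one is present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not valid Base64") from exc


def sniff_image_format(raw: bytes) -> str:
    """Guess the image format from the leading magic bytes."""
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if raw.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return "unknown"


if __name__ == "__main__":
    # Build a tiny fake PNG header so the example needs no external file.
    payload = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
    raw = decode_base64_image("data:image/png;base64," + payload)
    print(sniff_image_format(raw), len(raw))
```

Writing the decoded bytes to disk is then a plain binary write, with the file extension chosen from the sniffed format.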

bellatrix-tiny3-1b-webgpu

WebGPU-based LLM chatbot; try it in Chrome browsers.

⭐ 2 🌐 Public

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts.

⭐ 6 🌐 Public

Bidirectional-and-Auto-Regressive-Transformer-CNN

BART’s primary task is to generate clean, semantically coherent text from corrupted text data, but it can also be used for a variety of other NLP sub-tasks such as language translation, question answering, text summarization, and paraphrasing.

⭐ 6 🌐 Public

Bird-Species-Classifier-526

Bird-Species-Classifier-526 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224

⭐ 1 🌐 Public

Public repo for HF blog posts

⭐ 0 🌐 Public

Vision and Language Processing. [LaTeX OCR, Math Parsing, Text Analogy OCR]

⭐ 1 🌐 Public

Canopus-Realism

Realistic image generation. The realistic trigger words work properly; best suited to photorealistic trigger words, close-up shots, face diffusion, and male and female characters.

⭐ 8 🌐 Public

Captioner-Pro-Demo

⭐ 1 🌐 Public

3-In-1-Chatbot - GPT

⭐ 5 🌐 Public

Client-Record-CURD-OPs-Exercise

Client Record Management - CRUD operations + Blazor WebAssembly standalone app

⭐ 3 🌐 Public

Clipart-126-DomainNet

Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Codepy-Deepthink-3B

step-by-step solutions, creative content, and logical analyses

⭐ 1 🌐 Public

Common-Voice-Gender-Detection

Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, specifically trained to detect emotions in speech. This model utilizes the Wav2Vec2ForSequenceClassification architecture to accurately classify speaker emotions from audio signals.

⭐ 1 🌐 Public

Convert-to-Onnx-Hf-Dir

Convert a Hugging Face model to ONNX & Upload Directly to Your Hf Model Repo

⭐ 1 🌐 Public

Coral-Health is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify coral reef images into two health conditions using the SiglipForImageClassification architecture.

⭐ 2 🌐 Public

A specialized optical character recognition (OCR) application built on advanced vision-language models, designed for document-level OCR, long-context understanding, and mathematical LaTeX formatting. Supports both image and video processing with multiple state-of-the-art model

⭐ 3 🌐 Public

Cosmos-x-DocScope

Understands physical common sense and generates appropriate embodied decisions. Optimized for document-level optical character recognition and long-context vision-language understanding. Built with a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions or captions of given images.

⭐ 1 🌐 Public

CUA-GUI-Operator

A Gradio-based demonstration for Computer Use Agent (CUA) tasks, supporting multiple vision-language models: Microsoft Fara-7B, ByteDance UI-TARS-1.5-7B, Hcompany Holo2-4B, and Uniphore ActIO-UI-7B. Users upload UI screenshots (e.g., desktop or app interfaces).

⭐ 1 🌐 Public

Data Boards - Visualization of various plots (Analysis)

⭐ 2 🌐 Public

deepfake-detector-model-v1

deepfake-detector-model-v1 is a vision-language encoder model fine-tuned from siglip2-base-patch16-512 for binary deepfake image classification. It is trained to detect whether an image is real or generated using synthetic media techniques. The model uses the SiglipForImageClassification architecture.

⭐ 16 🌐 Public

Deepfake-Quality-Detection

Good and Bad Quality Deepfake Detection

⭐ 2 🌐 Public

Deepfake-vs-Real-8000

Deepfake vs Real is a dataset designed for image classification, distinguishing between deepfake and real images.

⭐ 1 🌐 Public

DeepSeek-OCR-experimental

A Gradio-powered web interface for performing advanced OCR tasks using the DeepSeek-OCR model. This experimental app leverages Hugging Face Transformers to process images for text extraction, document conversion, figure parsing, and object localization.

⭐ 2 🌐 Public

An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.

⭐ 3 🌐 Public

Doc-VLMs-v2-Localization

Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.

⭐ 3 🌐 Public

A powerful multi-modal AI application that combines three state-of-the-art vision-language models for comprehensive image and video analysis. DocScope-R1 provides OCR capabilities, detailed scene understanding, and video content analysis through an intuitive Gradio interface.

⭐ 2 🌐 Public

Document-Type-Detection

Document-Type-Detection is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to detect and classify types of documents from scanned or photographed images. This model is helpful for automated document sorting, OCR pipelines, and digital archiving systems.

⭐ 1 🌐 Public

dots.ocr-fix-demo

This Gradio application demonstrates the capabilities of the "dots.ocr" model, a powerful multilingual document parser.

⭐ 2 🌐 Public

The drex-062225-exp (Document Retrieval and Extraction Expert) model is a specialized fine-tuned version of docscopeocr-7b-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on top of the Qwen2.5-VL architecture.

⭐ 1 🌐 Public

⭐ 6 🌐 Public

EHRM (Electronic Health Record Management) introduces a centralized platform for analyzing patient records, offering insights into billing amounts, demographics, prevalent diagnoses, medical conditions, consulted doctors, admission types, and medication usage.

⭐ 10 🌐 Public

The primary issue is the fragmented nature of patient records within traditional healthcare systems. These records are stored in disparate formats across various departments or facilities, which hinders comprehensive analysis and decision-making. Additionally, medical data is voluminous and heterogeneous.

⭐ 8 🌐 Public

Face-Mask-Detection

Face-Mask-Detection is a binary image classification model based on google/siglip2-base-patch16-224, trained to detect whether a person is wearing a face mask or not. This model can be used in public health monitoring, access control systems, and workplace compliance enforcement.

⭐ 1 🌐 Public

Face-Swapper | Gradio Work Space | .hf.space

⭐ 10 🌐 Public

facial-age-detection

facial-age-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect and classify human faces into age groups ranging from early childhood to elderly adults. The model uses the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Facial-Emotion-Detection-SigLIP2

Facial-Emotion-Detection-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224

⭐ 6 🌐 Public

Fara-7B-GUI-Operator

A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Click on the search bar"), and receive parsed actions with visualized indicators overlaid on the image.

⭐ 4 🌐 Public

Fashion-Mnist-SigLIP2

Fashion-Mnist-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into Fashion-MNIST categories using the SiglipForImageClassification architecture.

⭐ 2 🌐 Public

Fashion-Product-Usage

Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.

⭐ 1 🌐 Public

FineTuning-MetaCLIP-2

This demonstrates the process of fine-tuning a large-scale pretrained model, MetaCLIP 2, for a specific downstream task: image classification.

⭐ 2 🌐 Public

FineTuning-SigLIP-2

Fine-Tuning SigLIP 2 for Single/Multi-Label Image Classification. Image classification vision-language encoder model fine-tuned for Image Classification Tasks

⭐ 45 🌐 Public

Fire-Detection-Siglip2

Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Flood-Image-Detection

Flood-Image-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image contains a flooded scene or non-flooded environment. The model uses the SiglipForImageClassification architecture.

⭐ 2 🌐 Public

Florence-2-Image-Caption

This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.

⭐ 6 🌐 Public

Endpoint Image Generation using Flux

⭐ 2 🌐 Public

Flux-Image-Captioner

FLUX.1-dev with Qwen2VL Captioner and Prompt Enhancer

⭐ 4 🌐 Public

Flux-Krea-multi-GPU-Pool

A Python-based multi-GPU image generation pipeline using Hugging Face Diffusers with LoRA (Low-Rank Adaptation) support. This project distributes image generation workloads across all available GPUs on the system, leveraging Python multiprocessing to optimize throughput and speed.

⭐ 1 🌐 Public
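The workload distribution described for Flux-Krea-multi-GPU-Pool can be pictured with a small sketch. It is not the repository's code: the function name `shard_prompts` and the round-robin policy are assumptions chosen for illustration. It only shows one simple way to split a batch of prompts across GPU ids before each shard is handed to its own worker process.

```python
from typing import Dict, List, Sequence


def shard_prompts(prompts: Sequence[str], device_ids: Sequence[int]) -> Dict[int, List[str]]:
    """Assign prompts to devices round-robin (hypothetical helper, not from the repo)."""
    if not device_ids:
        raise ValueError("at least one device id is required")
    # One empty work list per device.
    shards: Dict[int, List[str]] = {device: [] for device in device_ids}
    for index, prompt in enumerate(prompts):
        # Prompt i goes to device i mod N, so shard sizes differ by at most one.
        device = device_ids[index % len(device_ids)]
        shards[device].append(prompt)
    return shards


if __name__ == "__main__":
    for device, work in shard_prompts(["a cat", "a dog", "a bird"], [0, 1]).items():
        print(device, work)
```

In a real pipeline each shard would typically be passed to a separate process that loads the model onto its own GPU; that part is omitted here because it depends on the hardware and on the model-loading code.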

Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diverse artistic styles directly on top of the FLUX base model.

⭐ 13 🌐 Public

Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 100+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diverse artistic styles directly on top of the FLUX base model.

⭐ 4 🌐 Public

A Gradio-based web application for generating hyper-realistic images using FLUX.1-dev with Super Realism LoRA enhancement. This application provides an intuitive interface for creating high-quality, photorealistic images with customizable parameters and styles.

⭐ 16 🌐 Public

Flux-Sketch-Smudge-3to1

3:1 Best Image Gen

⭐ 1 🌐 Public

FLUX.1-Comparator-Krea-Dev

A high-performance Gradio application for comparing and generating images using two powerful FLUX.1 diffusion models: FLUX.1-dev-merged and FLUX.1-krea-merged-dev. This application provides an intuitive interface for AI-powered image generation with advanced customization options.

⭐ 2 🌐 Public

Flux.1-dev-4bit

FLUX.1-dev model with 4-bit quantization. The quantized model maintains image quality while significantly reducing GPU memory requirements, making it accessible for users with limited hardware resources.

⭐ 1 🌐 Public
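A rough back-of-the-envelope calculation shows why 4-bit quantization reduces memory for Flux.1-dev-4bit. The 12-billion parameter count below is an assumption about the FLUX.1-dev transformer, not a figure from this listing. The estimate covers weights only and ignores text encoders, the VAE, activations, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed parameter count for illustration only.
ASSUMED_PARAMS = 12e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage by a factor of four.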

Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.

⭐ 1 🌐 Public

Formula-Text-Detection

Formula-Text-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is built using the SiglipForImageClassification architecture to distinguish between mathematical formulas and natural text in document or image regions.

⭐ 2 🌐 Public

High Quality Image Generation Model - Powered with NVIDIA A100

⭐ 13 🌐 Public

Gemini-Image-Studio

A state-of-the-art image generation and editing tool powered by Google's Generative AI models. This React-based web application allows users to generate images from text prompts, edit existing images, or create images from hand-drawn sketches.

⭐ 3 🌐 Public

Gemini-Image-Studio-HF

A state-of-the-art image generation and editing tool powered by Google's Generative AI models. This React-based web application allows users to generate images from text prompts, edit existing images, or create images from hand-drawn sketches.

⭐ 4 🌐 Public

Gemma-3-Multimodal

Gemma 3 [ Image-text-text ] [ video inference ] [ multi image chat ]

⭐ 9 🌐 Public

Multiple Conditioned Image Generation, SDXL, Low-rank adaptation Refined

⭐ 4 🌐 Public

Gen-Vision-Multimodal

image, text, image-text-text, ocr

⭐ 2 🌐 Public

Gender-Classifier-Mini

Gender-Classifier-Mini is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images based on gender using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Geometric-Shapes-Classification

Geometric-Shapes-Classification is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a multi-class shape recognition task. It classifies various geometric shapes using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

GiD-Land-Cover-Classification

GiD-Land-Cover-Classification is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to detect land cover types in geographical or environmental imagery. This model can be used for urban planning, agriculture monitoring, and environmental analysis.

⭐ 1 🌐 Public

Gliese-CUA-Tool-Call-8B-Demo

A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, a Computer Use Agent (CUA) specialized in GUI understanding and tool-calling actions.

⭐ 1 🌐 Public

Gliese-CUA-Tool-Call-8B-Localization-Demo

A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, specialized in GUI element localization. Users upload UI screenshots, provide task instructions (e.g., "Click on the search bar"), and receive predicted click coordinates in Click(x, y) format.

⭐ 1 🌐 Public
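The Click(x, y) output format mentioned for the Gliese localization demo lends itself to a small parsing sketch. The helper names and the regular expression are assumptions for illustration, and the sketch treats the coordinates as pixel values, which the description does not confirm.

```python
import re
from typing import Optional, Tuple

# Matches "Click(x, y)" with integer or decimal coordinates.
CLICK_PATTERN = re.compile(
    r"Click\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_click(text: str) -> Optional[Tuple[float, float]]:
    """Return the first (x, y) pair found in the model output, or None."""
    match = CLICK_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def clamp_to_image(point: Tuple[float, float], width: int, height: int) -> Tuple[int, int]:
    """Clamp a point to valid pixel indices for a width x height screenshot."""
    x, y = point
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return int(round(x)), int(round(y))


if __name__ == "__main__":
    point = parse_click("Action: Click(120, 45)")
    if point is not None:
        print(clamp_to_image(point, 1920, 1080))
```

Clamping before drawing an overlay keeps a slightly out-of-range prediction from causing an indexing error on the screenshot.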

GLM-4.1V-9B-Thinking-Video-Understanding

GLM-4.1V-9B-Thinking is designed to explore the upper limits of reasoning in vision-language models. By introducing a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities.

⭐ 3 🌐 Public

Chat Response Documentation

⭐ 2 🌐 Public

MS Word Like Content Creation System

⭐ 2 🌐 Public

Layout for Seamless Image Assembly

⭐ 1 🌐 Public

Gemma with Questions

⭐ 1 🌐 Public

gemma with questions

⭐ 1 🌐 Public

Gym-Workout-Classifier-SigLIP2

Gym-Workout-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224

⭐ 1 🌐 Public

Hand-Gesture-2-Robot

Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Herculis-CUA-GUI-Actioner-4B-Demo

Demo: Herculis-CUA-GUI-Actioner-4B is a Computer Use Agent (CUA) multimodal model designed for GUI understanding, UI localization, and action execution across web, desktop, and mobile environments.

⭐ 1 🌐 Public

HF-POSTS-RECEIPT

hf receipt, bs4, jinja2

⭐ 0 🌐 Public

Hindi-Sign-Language-Detection

Hindi-Sign-Language-Detection is a vision-language model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It is trained to detect and classify Hindi sign language hand gestures into corresponding Devanagari characters using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

Hospital-Management-System

Hospital Management System built as a Streamlit application

⭐ 10 🌐 Public

How-to-run-huggingface-spaces-on-local-machine-demo

Running Hugging Face Spaces on a local machine or a Colab T4 GPU involves several steps. Hugging Face Spaces is a platform to host machine learning demos and applications using Streamlit, Gradio, or other frameworks.

⭐ 21 🌐 Public

Huggingface-Android-Application

URL to App Conversion

⭐ 14 🌐 Public

Human-Action-Recognition

Human-Action-Recognition is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class human action recognition. It uses the SiglipForImageClassification architecture to predict human activities from still images.

⭐ 3 🌐 Public

Human-vs-NonHuman-Detection

Human-vs-NonHuman-Detection is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images as either human or non-human using the SiglipForImageClassification architecture.

⭐ 1 🌐 Public

HunyuanOCR-Demo

A Gradio-based demonstration application for the Tencent HunyuanOCR model, focused on optical character recognition (OCR) tasks such as text detection, extraction, and coordinate formatting from images. Users can upload images, customize prompts (e.g., for Chinese/English text).

⭐ 2 🌐 Public

Image-Captioning-Salesforce-Blip

BlipProcessor and BlipForConditionalGeneration are the Hugging Face Transformers classes for BLIP, a transformer-based vision-language model from Salesforce, used here for conditional text generation (image captioning).

⭐ 7 🌐 Public

Image-Guard-2.0

⭐ 0 🌐 Public